Add asv benchmark jobs to CI #5796

Merged: 123 commits into pydata:main, Oct 24, 2021

Conversation

@Illviljan (Contributor) commented Sep 14, 2021

Workflow based on the version from scikit-image. Modified so that asv.conf.json lives inside a subdirectory, and the workflow triggers on every push if the PR has the run-benchmark label.

Notes:

References:

Tests checked:

  • interp
  • pandas
  • repr
  • combine
  • dataarray_missing
  • dataset_io - Skipped; too difficult to understand. Some of the tests are possibly broken.
  • indexing
  • reindexing
  • rolling - Some division-by-zero warnings are still printed.
  • unstacking

TODO:

  • self.setup_cache (see the sketch below)
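For context on that item: asv's setup_cache runs once per benchmark environment rather than once per timing repeat, and its return value is handed to setup() and the benchmark methods. A minimal sketch of how it could be used (class name, array shape, and benchmark are illustrative, not taken from this PR):

    import numpy as np
    import xarray as xr


    class ExampleBenchmark:
        def setup_cache(self):
            # asv runs this once per environment (not once per repeat); the
            # return value is passed as the first argument to setup() and to
            # every benchmark method in the class.
            return xr.DataArray(
                np.random.randn(365, 75, 75), dims=("time", "y", "x")
            )

        def setup(self, da):
            self.da = da

        def time_mean(self, da):
            da.mean("time")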

@Illviljan marked this pull request as draft September 14, 2021 22:02
github-actions bot commented Sep 14, 2021

Unit Test Results

6 files, 6 suites, 1h 0m 16s ⏱️
16 226 tests: 14 490 ✔️ passed, 1 736 💤 skipped, 0 failed
90 552 runs: 82 372 ✔️ passed, 8 180 💤 skipped, 0 failed

Results for commit 70cd679.

♻️ This comment has been updated with latest results.

@max-sixty (Collaborator) commented:

Awesome!! Thanks a lot, @Illviljan!

On how we run these — I would be fine with the labeled approach that scikit-learn uses.

What else do you think is needed for this? It already looks excellent, thank you.

@Illviljan (Contributor, Author) commented Sep 15, 2021

Here are some remaining items besides the missing green ticks, @max-sixty:

  • Long run times, 240+ minutes. scikit-image had it down to about 13 minutes somehow. What's the bottleneck? Which tests are extremely slow?
  • A lot of printed errors make the report very messy; go through each test and fix those. Not necessary for this PR, though.
  • asv.conf.json has been moved. If you have any idea how to run this job from inside asv_bench, that would be nice.
  • Should we align the benchmark folder names with other projects, e.g. NumPy?
  • Add more required installs? For example, bottleneck crashes tests because it isn't installed. Should we add all the non-required dependencies as well?
  • Can a normal user add and remove labels? I set this up so that I don't have to deal with the nightmare of setting up asv locally, and it would be nice to avoid having to ask you over and over again to add/remove labels just to trigger it.

@dcherian (Contributor) commented:

Long times, +240 minutes

Yeah, this is a problem.

Maybe we need to go through and mark the slow tests like in the README_ci.md document:

In that vein, a new private function is defined at benchmarks.__init__: _skip_slow. This will check if the ASV_SKIP_SLOW environment variable has been defined. If set to 1, it will raise NotImplementedError and skip the test. To implement this behavior in other tests, you can add the following attribute:
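A rough sketch of what that helper and the opt-in hook could look like, assuming it only inspects the environment variable (this is paraphrased for illustration, not the exact scikit-image code, and the attribute referenced in README_ci.md is not reproduced here):

    import os


    def _skip_slow():
        """Raise NotImplementedError so asv records the benchmark as skipped
        whenever ASV_SKIP_SLOW=1 is set in the environment."""
        if os.environ.get("ASV_SKIP_SLOW", "0") == "1":
            raise NotImplementedError("Skipping slow benchmark")


    class SlowSuite:
        def setup(self):
            # Calling the helper here skips every benchmark in the class
            # when ASV_SKIP_SLOW=1.
            _skip_slow()

        def time_something_slow(self):
            ...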

Add more required installs? For example, bottleneck crashes tests because it isn't installed. Should we add all the non-required dependencies as well?

Do you mean skipping those that require bottleneck rather than relying on asv to handle the crash? If so, sounds good.
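A minimal sketch of such a guard (the helper and class names are made up for illustration): asv treats NotImplementedError raised during setup() as "skip this benchmark" rather than as a failure.

    def _require_bottleneck():
        # Raising NotImplementedError in setup() makes asv skip the benchmark
        # instead of reporting a crash when bottleneck is missing.
        try:
            import bottleneck  # noqa: F401
        except ImportError:
            raise NotImplementedError("bottleneck is not installed")


    class RollingWithBottleneck:
        def setup(self):
            _require_bottleneck()

        def time_rolling_mean(self):
            ...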

Can a normal user add and remove labels?

An alternative approach would be to use something similar to our [skip-ci] and [test-upstream] tags in the commit message. Though I think that tag needs to be on every commit you want benchmarked.

@dcherian (Contributor) commented:

Ah one major problem IIUC is that we run a lot of benchmarks with and without bottleneck even if bottleneck isn't involved in the operation being benchmarked.


Some of the following should be fixed:


ImportError: Pandas requires version '0.12.3' or newer of 'xarray' (version '0.0.0' currently installed).


rolling.Rolling.time_rolling_construct
xarray.core.merge.MergeError: conflicting values for variable 'x_coords' on objects to be combined. You can skip this check by specifying compat='override'.


IOWriteNetCDFDaskDistributed.time_write
Looks like we're spinning up multiple dask clusters.
UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33423 instead
warnings.warn(

@Illviljan (Contributor, Author) commented Sep 15, 2021

Do you mean skipping those that require bottleneck rather than relying on asv to handle the crash? If so, sounds good

Hmm, I was just thinking of installing all possible dependencies. But I think I've simply misunderstood the tests and why they were crashing.

The 'Benchmark no label triggered' job seems to have succeeded but still failed? Does anyone understand what the error means?

 [100.00%] ··· unstacking.UnstackingDask.time_unstack_slow             32.7±0.3ms

BENCHMARKS NOT SIGNIFICANTLY CHANGED.
+ grep 'Traceback \|failed\|PERFORMANCE DECREASED' benchmarks.log
+ exit 1
++ '[' 1 = 1 ']'
++ '[' -x /usr/bin/clear_console ']'
++ /usr/bin/clear_console -q
Error: Process completed with exit code 1.

Edit: I think I understand now. The log contains a bunch of Traceback errors, so the grep step fails even though the performance didn't change.

@Illviljan (Contributor, Author) commented:

@dcherian did you have something specific in mind when you linked to https://github.com/jaimergp/scikit-image/blob/main/.github/workflows/benchmarks-cron.yml in #4648? Otherwise I think I'll remove it and just focus on:

  • Benchmark label triggered - from scikit-image.
  • Benchmark no label triggered - from scikit-image, modified by me.

@Illviljan (Contributor, Author) commented Sep 15, 2021

There we go. I think the workflow works as intended now; we'll see in 3-4 hours.
The only thing left is to improve the tests, which can be done in other PRs, and maybe to rewrite that README slightly.

@Illviljan (Contributor, Author) commented Oct 7, 2021

Here's how long dataarray_missing.py takes in this workflow with different shapes:
321a761 - shape=(100, 25, 25), 3 minutes
8f262f9 - shape=(365, 50, 50), 3m 28s
0b7b1a0 - shape=(365, 75, 75), 4m 8s
8f08506 - shape=(365, 100, 100), 5m 47s
d1b908a - shape=(365, 200, 400), 12m 38s
56556f1 - shape=(3650, 100, 100), 19m 55s and crashes
1eba65c - shape=(3650, 200, 400), 20 minutes and crashes

Changed the shape to (365, 75, 75) as that seems to be around the tipping point where it starts slowing down.
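For reference, a rough sketch of how a shape like (365, 75, 75) typically feeds into a dataarray_missing-style benchmark (this is an approximation for illustration, not the exact code in asv_bench/benchmarks/dataarray_missing.py):

    import numpy as np
    import pandas as pd
    import xarray as xr


    def make_bench_data(shape=(365, 75, 75), frac_nan=0.1, seed=0):
        rng = np.random.default_rng(seed)
        data = rng.standard_normal(shape)
        # Punch random holes so interpolate_na/ffill have real work to do.
        data[rng.random(shape) < frac_nan] = np.nan
        time = pd.date_range("2000-01-01", periods=shape[0], freq="D")
        return xr.DataArray(data, dims=("time", "y", "x"), coords={"time": time})


    class DataArrayMissingInterpolateNA:
        def setup(self):
            self.da = make_bench_data()

        def time_interpolate_na(self):
            # Run time scales roughly with the number of elements, which is
            # why the workflow time grows with the shapes listed above.
            self.da.interpolate_na(dim="time")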

@dcherian (Contributor) commented:

Thanks @Illviljan, this is great work!

@dcherian merged commit 26e2e61 into pydata:main Oct 24, 2021
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 29, 2021
* upstream/main:
  Only run asv benchmark when labeled (pydata#5893)
  Add asv benchmark jobs to CI (pydata#5796)
  Remove use of deprecated `kind` argument in `CFTimeIndex` tests (pydata#5723)
  Single matplotlib import (pydata#5794)
  Check jupyter nbs with black in pre-commit (pydata#5891)
snowman2 pushed a commit to snowman2/xarray that referenced this pull request Feb 9, 2022
@Illviljan deleted the asv-benchmark-cron branch August 12, 2022 09:01
Labels
run-benchmark Run the ASV benchmark workflow